Towards Enhancing Coherence in Extractive Summarization: Dataset and Experiments with LLMs

Parmar, Mihir, Deilamsalehy, Hanieh, Dernoncourt, Franck, Yoon, Seunghyun, Rossi, Ryan A., Bui, Trung

arXiv.org Artificial Intelligence

Extractive summarization plays a pivotal role in natural language processing due to its wide-ranging applications in summarizing diverse content efficiently while remaining faithful to the original content. Despite the significant advances in extractive summarization achieved by Large Language Models (LLMs), their summaries frequently exhibit incoherence. An important aspect of a coherent summary is its readability for the intended users. Although many datasets and benchmarks have been proposed for creating coherent extractive summaries, none of them currently incorporate user intent to improve coherence in extractive summarization. Motivated by this, we propose a systematically created, human-annotated dataset consisting of coherent summaries for five publicly available datasets and natural language user feedback, offering valuable insights into how to improve coherence in extractive summaries. We utilize this dataset to align LLMs through supervised fine-tuning with natural language human feedback, enhancing the coherence of their generated summaries. Preliminary experiments with Falcon-40B and Llama-2-13B show significant performance improvements (~10% Rouge-L) in producing coherent summaries. We further utilize human feedback to benchmark results over instruction-tuned models such as FLAN-T5, which yielded several interesting findings. Data and source code are available at https://github.com/Mihir3009/Extract-AI.
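The alignment step described here is standard supervised fine-tuning in which the natural-language feedback is folded into the prompt. Below is a minimal sketch of how (document, feedback, coherent summary) triples could be turned into prompt/target pairs; the record fields and prompt template are assumptions for illustration, not the authors' actual pipeline, which is available at the linked repository.

```python
# Minimal sketch: formatting (document, user feedback, coherent summary) triples
# into prompt/target pairs for supervised fine-tuning of an LLM.
# Field names and the prompt template are illustrative assumptions.

def build_example(document: str, feedback: str, coherent_summary: str) -> dict:
    prompt = (
        "Extract a coherent summary of the document below.\n"
        f"Document:\n{document}\n"
        f"User feedback on coherence:\n{feedback}\n"
        "Coherent extractive summary:"
    )
    return {"prompt": prompt, "target": coherent_summary}


examples = [
    build_example(
        document="Sentence 1. Sentence 2. Sentence 3.",
        feedback="Sentences 1 and 3 read as disconnected; keep adjacent context.",
        coherent_summary="Sentence 1. Sentence 2.",
    )
]
# `examples` would then be tokenized and fed to a standard causal-LM
# fine-tuning loop (e.g. for Falcon-40B or Llama-2-13B).
```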


Automated Classification of Model Errors on ImageNet

Peychev, Momchil, Müller, Mark Niklas, Fischer, Marc, Vechev, Martin

arXiv.org Artificial Intelligence

While the ImageNet dataset has been driving computer vision research over the past decade, significant label noise and ambiguity have made top-1 accuracy an insufficient measure of further progress. To address this, new label-sets and evaluation protocols have been proposed for ImageNet, showing that state-of-the-art models already achieve over 95% accuracy and shifting the focus to investigating why the remaining errors persist. Recent work in this direction employed a panel of experts to manually categorize all remaining classification errors for two selected models. However, this process is time-consuming, prone to inconsistencies, and requires trained experts, making it unsuitable for regular model evaluation and thus limiting its utility. To overcome these limitations, we propose the first automated error classification framework, a valuable tool for studying how modeling choices affect error distributions. We use our framework to comprehensively evaluate the error distribution of over 900 models. Perhaps surprisingly, we find that across model architectures, scales, and pre-training corpora, top-1 accuracy is a strong predictor of the portion of all error types. In particular, we observe that the portion of severe errors drops significantly with increasing top-1 accuracy, indicating that, while top-1 accuracy underreports a model's true performance, it remains a valuable performance metric.
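The headline finding, that top-1 accuracy predicts the share of each error type, can be checked with a simple correlation once every model's errors have been categorized. The sketch below assumes a hypothetical per-model table of error-type counts; the category names are illustrative stand-ins and not the paper's exact taxonomy.

```python
# Sketch: correlating top-1 accuracy with the portion of severe errors
# across many evaluated models. The data structure and category names
# are illustrative assumptions, not the paper's exact taxonomy.
import numpy as np

# One entry per model: top-1 accuracy and counts of categorized errors.
models = [
    {"top1": 0.82, "errors": {"severe": 120, "fine_grained": 400, "ambiguous": 300}},
    {"top1": 0.88, "errors": {"severe": 60, "fine_grained": 350, "ambiguous": 290}},
    {"top1": 0.91, "errors": {"severe": 25, "fine_grained": 310, "ambiguous": 280}},
]

top1 = np.array([m["top1"] for m in models])
severe_portion = np.array(
    [m["errors"]["severe"] / sum(m["errors"].values()) for m in models]
)

# Pearson correlation between accuracy and the share of severe errors.
r = np.corrcoef(top1, severe_portion)[0, 1]
print(f"correlation(top-1 accuracy, severe-error portion) = {r:.2f}")
```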


Summarizing, Simplifying, and Synthesizing Medical Evidence Using GPT-3 (with Varying Success)

Shaib, Chantal, Li, Millicent L., Joseph, Sebastian, Marshall, Iain J., Li, Junyi Jessy, Wallace, Byron C.

arXiv.org Artificial Intelligence

Large language models, particularly GPT-3, are able to produce high quality summaries of general domain news articles in few- and zero-shot settings. However, it is unclear if such models are similarly capable in more specialized, high-stakes domains such as biomedicine. In this paper, we enlist domain experts (individuals with medical training) to evaluate summaries of biomedical articles generated by GPT-3, given zero supervision. We consider both single- and multi-document settings. In the former, GPT-3 is tasked with generating regular and plain-language summaries of articles describing randomized controlled trials; in the latter, we assess the degree to which GPT-3 is able to synthesize evidence reported across a collection of articles. We design an annotation scheme for evaluating model outputs, with an emphasis on assessing the factual accuracy of generated summaries. We find that while GPT-3 is able to summarize and simplify single biomedical articles faithfully, it struggles to provide accurate aggregations of findings over multiple documents. We release all data and annotations used in this work.
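Zero-shot summarization of this kind is driven entirely by the prompt. The sketch below shows how a single-document plain-language summary might be requested from GPT-3 via the legacy (pre-1.0) openai Python client; the model name, prompt wording, and decoding parameters are assumptions and not the prompts used in the study.

```python
# Sketch: zero-shot plain-language summarization of a single article with GPT-3,
# using the legacy (pre-1.0) openai Python client. The model name, prompt
# wording, and decoding parameters are illustrative assumptions.
import openai

openai.api_key = "YOUR_API_KEY"  # placeholder

article = "..."  # full text or abstract of a randomized controlled trial report

prompt = (
    "Summarize the following medical article for a general audience, "
    "in plain language, without adding information that is not in the text.\n\n"
    f"Article:\n{article}\n\nPlain-language summary:"
)

response = openai.Completion.create(
    model="text-davinci-003",   # assumed GPT-3 variant
    prompt=prompt,
    max_tokens=256,
    temperature=0.0,            # deterministic decoding for evaluation
)
print(response["choices"][0]["text"].strip())
```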


Producing insights with Generalized Additive Models (GAMs)

#artificialintelligence

Today we are going to learn how to use Generalized Additive Models to predict the number of bicycles rented in Washington D.C. between 2011 and 2012. This dataset was provided by the bike-sharing company Capital Bikeshare. Bike-sharing systems are a new generation of services that allow users to pick up and drop off bicycles at convenient locations, promoting zero-emission transportation with positive effects on traffic, the environment, and health. "A generalized additive model is a generalized linear model with a linear predictor involving a sum of smooth functions of covariates" (Wood, 2017).
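Wood's definition translates directly into code: the response is modeled through a link function applied to a sum of smooth functions of the covariates. A minimal sketch with the pygam library is shown below, using synthetic data; the feature columns (temperature, humidity, hour of day) are assumed stand-ins for the Capital Bikeshare variables, and a Poisson GAM is used because hourly rental numbers are counts.

```python
# Minimal sketch: fitting a GAM to hourly bike-rental counts with pygam.
# The feature columns are assumed stand-ins for the Capital Bikeshare data.
import numpy as np
from pygam import PoissonGAM, s

rng = np.random.default_rng(0)
n = 500
temperature = rng.uniform(-5, 35, n)   # degrees Celsius
humidity = rng.uniform(20, 100, n)     # percent
hour = rng.integers(0, 24, n)          # hour of day

# Synthetic counts: rentals rise with temperature, dip at night and at high humidity.
rate = np.exp(2.5 + 0.04 * temperature - 0.01 * humidity
              + 0.5 * np.sin(2 * np.pi * hour / 24))
rentals = rng.poisson(rate)

X = np.column_stack([temperature, humidity, hour])

# g(E[y]) = beta_0 + f1(temperature) + f2(humidity) + f3(hour), with a log link.
gam = PoissonGAM(s(0) + s(1) + s(2)).fit(X, rentals)
gam.summary()               # smoothing terms and their significance
print(gam.predict(X[:5]))   # predicted rental counts for the first five rows
```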


Learning to summarize from human feedback

Stiennon, Nisan, Ouyang, Long, Wu, Jeff, Ziegler, Daniel M., Lowe, Ryan, Voss, Chelsea, Radford, Alec, Amodei, Dario, Christiano, Paul

arXiv.org Artificial Intelligence

As language models become more powerful, training and evaluation are increasingly bottlenecked by the data and metrics used for a particular task. For example, summarization models are often trained to predict human reference summaries and evaluated using ROUGE, but both of these metrics are rough proxies for what we really care about---summary quality. In this work, we show that it is possible to significantly improve summary quality by training a model to optimize for human preferences. We collect a large, high-quality dataset of human comparisons between summaries, train a model to predict the human-preferred summary, and use that model as a reward function to fine-tune a summarization policy using reinforcement learning. We apply our method to a version of the TL;DR dataset of Reddit posts and find that our models significantly outperform both human reference summaries and much larger models fine-tuned with supervised learning alone. Our models also transfer to CNN/DM news articles, producing summaries nearly as good as the human reference without any news-specific fine-tuning. We conduct extensive analyses to understand our human feedback dataset and fine-tuned models. We establish that our reward model generalizes to new datasets, and that optimizing our reward model results in better summaries than optimizing ROUGE according to humans. We hope the evidence from our paper motivates machine learning researchers to pay closer attention to how their training loss affects the model behavior they actually want.
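The core of this approach is a reward model trained on pairwise human comparisons: it should assign a higher score to the summary the annotator preferred, typically via a logistic (Bradley-Terry style) loss. The PyTorch sketch below shows that loss computed over precomputed summary embeddings; the embedding dimension and the simple linear head are assumptions standing in for the full Transformer reward model used in the paper.

```python
# Sketch: pairwise reward-model loss on human comparisons between two summaries.
# A linear head over precomputed summary embeddings stands in for the full
# Transformer reward model; dimensions are illustrative assumptions.
import torch
import torch.nn as nn

embed_dim = 768
reward_head = nn.Linear(embed_dim, 1)

# Embeddings of the preferred and rejected summary for a batch of comparisons.
preferred = torch.randn(8, embed_dim)
rejected = torch.randn(8, embed_dim)

r_pref = reward_head(preferred).squeeze(-1)   # scalar reward per summary
r_rej = reward_head(rejected).squeeze(-1)

# Maximize the probability that the preferred summary receives the higher reward:
# loss = -log sigmoid(r_pref - r_rej).
loss = -torch.nn.functional.logsigmoid(r_pref - r_rej).mean()
loss.backward()
print(float(loss))
```

Once trained, the reward model's scalar output serves as the reward signal for reinforcement-learning fine-tuning of the summarization policy.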


PEAK: Pyramid Evaluation via Automated Knowledge Extraction

Yang, Qian (Tsinghua University) | Passonneau, Rebecca J. (Columbia University) | Melo, Gerard de (Tsinghua University)

AAAI Conferences

Evaluating the selection of content in a summary is important both for human-written summaries, which can be a useful pedagogical tool for reading and writing skills, and for machine-generated summaries, which are increasingly being deployed in information management. The pyramid method assesses a summary by aggregating content units from the summaries of a wise crowd (a form of crowdsourcing). It has proven highly reliable but has largely depended on manual annotation. We propose PEAK, the first method to automatically assess summary content using the pyramid method while also automatically generating the pyramid content models. PEAK relies on open information extraction and graph algorithms. The resulting scores correlate well with manually derived pyramid scores on both human and machine summaries, opening up the possibility of widespread use in numerous applications.
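At the heart of the pyramid method is a simple weighting scheme: each content unit is weighted by the number of reference ("wise crowd") summaries it appears in, and a candidate summary is scored against the best attainable weight for the same number of units. The sketch below implements only that scoring step on already-extracted content units; the automatic extraction and graph-based matching that PEAK contributes are not shown, and the example units are invented for illustration.

```python
# Sketch of pyramid scoring over already-extracted content units.
# Each unit's weight = number of reference summaries containing it; the score
# normalizes by the best attainable weight for the same number of units.
from collections import Counter

def pyramid_score(candidate_units, reference_unit_sets):
    weights = Counter()
    for units in reference_unit_sets:
        for unit in set(units):
            weights[unit] += 1

    achieved = sum(weights[unit] for unit in set(candidate_units))

    # Ideal score: the same number of units, chosen with the highest weights.
    n = len(set(candidate_units))
    ideal = sum(sorted(weights.values(), reverse=True)[:n])
    return achieved / ideal if ideal else 0.0

references = [
    {"drug reduced pain", "trial was randomized", "side effects were mild"},
    {"drug reduced pain", "trial was randomized"},
    {"drug reduced pain", "side effects were mild"},
]
candidate = {"drug reduced pain", "side effects were mild"}
print(pyramid_score(candidate, references))   # (3 + 2) / (3 + 2) = 1.0
```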